AUTOMATED SEGMENT MATCHING ALGORITHM
THEORY, TEST, AND EVALUATION 

1. INTRODUCTION 


1.1 BACKGROUND 


The work reported here was carried out as part of the AgRISTARS 
Domestic Crops and Land Cover project (DCLC) Scene-to-Map Registration 
Task. The objective of the task was to develop an algorithm that would
automate the United States Department of Agriculture/Statistical 
Reporting Service (USDA/SRS) process of segment shifting. 

Much of the USDA/SRS crop area estimation approach depends on a set of 
sample segments for developing crop signatures from Landsat data. The 
information from the spectral data is used with ground truth data to 
develop regression estimators. The registration of the sample segments 
to the raw Landsat data is an essential step for minimizing the mean 
square errors from the regression estimation process. 

Currently, an initial registration of the segment data is obtained 
using a least-squares fit based on selected control points. The 
initial registration gives an adequate fit on a global basis, but on a 
local basis more precision can be obtained. The USDA/SRS presently 
accomplishes this by manually shifting the segments in the locality of 
the initial gross registration until a "good" fit can be visually
detected. 


1.2 USDA/SRS SEGMENT SHIFTING PROCEDURE 


Once the segments are digitized from either aerial photographs or 
topographic maps, the coordinates are converted to UTM using
coefficients from a segment network file (Ozga et al., 1977). The UTM
coordinates are transformed to latitude and longitude and then to 
Landsat lines and columns using mapping coefficients from a segment 
calibration file. Hard copies of the digitized segment boundaries are 
obtained from a printer, as well as grey level prints of the Landsat 
imagery, usually bands 5 and 7. The print of the registered segment 
boundary is overlaid on the Landsat print at the location of the
initial registration. The boundary plot is then shifted around until a 
better fit is found. The new fit is recorded as the shift necessary to 
correct the original registration. For example, the segment boundary 
location may have to be shifted one column to the left and two lines 
up. This shifting process is carried out for all sample segments 
before any Landsat data is processed for spectral signature 
development. 

The USDA assumes the registration errors to be pure translation errors,
and they were treated as such in this study. For this reason, the
process is restricted to shifting in the row and/or column directions.

An example of a segment boundary plot overlaid on raw Landsat data is
illustrated in Figure 1. This picture shows the initial registration
of the segment boundary to the raw data. The poor correlation between 
the two images is readily apparent. Figure 2 illustrates the segment 
boundary location for the same segment after it has been shifted. This 
particular segment required a shift of -1 rows and -3 columns from the 
original registration to locate it correctly. 

Figure 1. Initial Registration of Segment Boundary to Raw Data
(panel title: ORIGINAL REGISTRATION)

Figure 2. (panel title: AFTER SHIFTING (ASMA))

2. THE AUTOMATED SEGMENT MATCHING ALGORITHM (ASMA)


2.1 SEGMENT DATA PREPARATION 

2.1.1 INITIAL REGISTRATION OF SEGMENTS 

The shifting program requires the initial registration of the segment 
as a reference point, as does the manual process. Therefore, the same 
coefficients as described in Section 1.2 are used by the program to 
convert digitizer coordinates to UTM, then to latitude and longitude 
before they are converted to Landsat lines and elements. Once the 
segment vertices are in Landsat coordinates, the segments can be
reconstructed on a grid.

2.1.2 'RESAMPLING' OF SEGMENT BOUNDARIES TO LANDSAT DATA

It was decided at the outset of this study to work on a quarter-pixel
resolution cell size (Graham, 1981). This cell size was chosen in 
order to work with half row and half column precision. The original 
USDA objectives required that the algorithm be correct within a half 
pixel. Therefore, before resampling the segment boundary to the 
Landsat reference, the segment line and element vertices are doubled. 
(The program works on an array which is twice as long and twice as wide 
as the original input cell array.) 

A grid is constructed with each cell side representing one half pixel. 
The program then interpolates between vertices to obtain the cells to 
which the boundaries are remapped. 
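
The resampling step can be sketched as follows. This is only an illustration under stated assumptions, not the ASMA source: the function name `rasterize_boundary`, the use of NumPy, and the simple linear stepping between vertices are all hypothetical, since the report does not specify the interpolation scheme.

```python
import numpy as np

def rasterize_boundary(vertices, n_lines, n_elements):
    """Mark boundary cells on a half-pixel grid (hypothetical helper).

    vertices: list of (line, element) pairs in Landsat coordinates that
              describe a closed polygon.
    Returns an int array of shape (2*n_lines, 2*n_elements) holding 1 on
    boundary cells and 0 elsewhere.
    """
    grid = np.zeros((2 * n_lines, 2 * n_elements), dtype=int)
    # Doubling the coordinates makes each grid cell one half pixel wide.
    pts = [(2.0 * line, 2.0 * elem) for line, elem in vertices]
    for (l0, e0), (l1, e1) in zip(pts, pts[1:] + pts[:1]):
        # Assumed interpolation: sample each edge densely enough that
        # consecutive samples are at most one cell apart.
        steps = int(max(abs(l1 - l0), abs(e1 - e0))) + 1
        for t in np.linspace(0.0, 1.0, steps + 1):
            r = int(round(l0 + t * (l1 - l0)))
            c = int(round(e0 + t * (e1 - e0)))
            if 0 <= r < grid.shape[0] and 0 <= c < grid.shape[1]:
                grid[r, c] = 1
    return grid
```

A square with vertices at (1, 1), (1, 3), (3, 3), (3, 1) on a 5 x 5 pixel area, for instance, becomes a closed ring of boundary cells on the 10 x 10 half-pixel grid.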

2.1.3 THE RECONSTRUCTED SEGMENT GRID 


The array containing the resampled segment boundaries is mapped as 
follows: a 1 represents a boundary, a 0 represents points outside the
segment, and the remaining numbers represent field numbers. A 1 has
been added to the field numbers in the program to distinguish a field
value of 1 from a boundary. These values are stored in memory during
run-time. The program numbers the fields after the boundaries are
constructed. An example of a reconstructed grid is shown in Figure 3. 
(The field numbers are not shown.) 

2.2 LANDSAT DATA PREPARATION 


The Landsat bands 5 and 7 data is extracted for the area encompassing
the segment with an additional 5 pixels on each side. These padding
pixels define the area in which the segment is to be shifted. It was
decided to use 5 pixels because the registration errors were never
larger than 5 pixels (see footnote 1) in the sample data sets. This area
differs from the 10 pixel shifting area discussed by Graham (1981);
using the 5 pixel shifting area also decreases the size of the program
as well as run time.

2.2.1 EDGE ENHANCEMENT OF LANDSAT DATA 


The segment shifting program uses edge enhanced data to locate the 
segment boundaries. The data is transformed to a gradient image using 
the same equations given by Graham (1981). These are given by: 

1. For a 57 m resolution, this corresponds to 285 meters.

Where:

X_{hi} is the Landsat reflectance value for location L_h, band i,
h = 0, 1, 2, 3 denotes location, illustrated in Figure 4a below,
i = bands 5 and 7 of Landsat.

Figure 4a. Original Cell Locations (2x2 window of cells L_0, L_1, L_2, L_3)


The values in (1), (2), and (3) are output to an expanded grid, Figure
4b. This grid represents the quarter pixel cell size discussed in
Section 2.1.2. The original cell locations, from Figure 4a, are shown
in their new location.

Figure 4b. New Cell Locations




The g values are set to 10 if their computed value is greater than 10.
This saturation value is used to prevent extremely large values of the 
gradient from masking out the effect of more subtle but significant 
changes in land cover. This effect will become more readily apparent 
in the ensuing discussion of the algorithm. 

The above equations show the output values based on (0) as the pivot.
The algorithm slides this 2x2 window to the next pixel, and
computes (1), (2), and (3) with it as the pivot. The process is
carried out for all pixels in the search area. The original cell
locations are not assigned values by this process; they are assigned
the mean value from the neighboring 8 cells (or 5 cells at the file
edge).

This gradient image is used by the algorithm for the segment shifting 
and statistics computations. The raw Landsat data is no longer
required after this point. 
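
The saturation and the filling of the original cell locations might look like the sketch below. It does not reproduce equations (1) to (3); it starts from an expanded grid in which the gradient cells are already computed. The function name `saturate_and_fill` and the placement of the original cells at even (row, column) indices of the expanded grid are assumptions made for illustration only.

```python
import numpy as np

SATURATION = 10.0  # gradients above this value are set to 10 (from the text)

def saturate_and_fill(expanded):
    """Clip gradients and fill the original cell locations.

    expanded: 2-D float array on the half-pixel grid. Cells holding the
    computed gradients are already filled in; the original cell
    locations are ASSUMED to sit at even (row, col) indices and may hold
    any finite placeholder value.
    """
    g = np.minimum(expanded, SATURATION)
    out = g.copy()
    rows, cols = g.shape
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            r0, r1 = max(r - 1, 0), min(r + 2, rows)
            c0, c1 = max(c - 1, 0), min(c + 2, cols)
            window = g[r0:r1, c0:c1]
            # Mean of the neighbours only: the centre cell is removed
            # from the sum and from the count (8 neighbours in the
            # interior, 5 on an edge, 3 at a corner).
            out[r, c] = (window.sum() - g[r, c]) / (window.size - 1)
    return out
```

With this layout every neighbour of an original cell is a gradient cell, which matches the statement that the mean is taken over the neighboring 8 cells.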


2.3 THE SEGMENT SHIFTING AND MATCHING PROCEDURE 


2.3.1 SHIFTING THE SEGMENTS 


The segment array is indexed to the gradient array at the location of 
the initial registration. The gradient values are summed along the 
boundaries within and around the segment. That is, the boundary file 
is used as a mask into the gradient file. This can be stated 
mathematically as follows: 
Let

    b_{ij} = 1 for a boundary cell
           = 0 elsewhere

and

    g_{ij} = cell gradient

then

    s = \sum_{i,j} b_{ij} \, g_{ij}                                (4)

where s is the matching coefficient. The value s is standardized
to a mean of 0 and a standard deviation of 1. This is done to obtain a
relative matching coefficient at each cell. The segment boundary is
shifted over the entire search area and an s value computed at each
shift.

The s value is directly proportional to the amount of agreement between 
the two images. Therefore, the maximum s value is taken as the best
match and the corresponding shift is recorded. The shift is accepted 
only if s is above a certain threshold. 
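
The search over shifts can be sketched as an exhaustive translation search. The code below is a minimal illustration, not the ASMA program: the function name `best_shift`, the array layout, and the choice to standardize over all shifts of one segment's search area are assumptions.

```python
import numpy as np

def best_shift(boundary, gradient, max_shift):
    """Exhaustive translation search (illustrative sketch).

    boundary: 0/1 array, 1 on segment boundary cells.
    gradient: gradient image padded by max_shift cells on every side, so
              gradient.shape equals boundary.shape plus 2*max_shift.
    Returns (row_shift, col_shift, standardized s) for the best match,
    with shifts expressed in grid cells.
    """
    h, w = boundary.shape
    shifts, scores = [], []
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            r0, c0 = max_shift + dr, max_shift + dc
            window = gradient[r0:r0 + h, c0:c0 + w]
            # Raw matching coefficient: gradients summed under the mask.
            scores.append(float((boundary * window).sum()))
            shifts.append((dr, dc))
    scores = np.array(scores)
    # Standardize to mean 0 and standard deviation 1 over all shifts
    # (assumes the scores are not all identical).
    z = (scores - scores.mean()) / scores.std()
    k = int(z.argmax())
    return shifts[k][0], shifts[k][1], float(z[k])
```

On the half-pixel grid of Section 2.1.2, a returned shift of one cell would correspond to half a Landsat pixel.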

An empirical study was performed on several data sets to determine a
threshold value against which to compare s. This study used histograms
of the s values for all segments within a given data set. The shifts
from the algorithm were determined to be either matches or non-matches
by comparing the algorithm results to the manual shifting (USDA/SRS)
results. The histogram, Figure 5, shows the distribution of s values
for the accepted and unaccepted shifts. The RMS errors were computed
for six data sets using s values from 3.2 to 3.6 in order to find a
threshold value for the first stage acceptance test. The optimum value
for the threshold, 3.4, was the one giving the smallest RMS
error. Any segment shifts with an s value less than 2.0 are discarded
since these shifts do not appear to be significantly different from
other shifts in the sample.
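
The resulting three-way decision can be written compactly. In this sketch the function name and the return labels are hypothetical, and the handling of values exactly equal to a threshold is an assumption; the thresholds 2.0 and 3.4 are those reported for the first stage test.

```python
DISCARD_BELOW = 2.0  # shifts with s below this are discarded
ACCEPT_ABOVE = 3.4   # first stage acceptance threshold

def first_stage(s):
    """Classify a standardized matching coefficient s."""
    if s < DISCARD_BELOW:
        return "discard"
    if s > ACCEPT_ABOVE:
        return "accept"
    # Remaining values (2.0 to 3.4) go on to the second stage test.
    return "second_stage"
```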

Figure 5. Bar Histogram Representation of Standardized
S-Values for Sample of 30 Segments
(axis: Standardized S Values; < 2.0 Discarded; 2.0-3.4 Enter Second
Stage; > 3.4 Accepted in First Stage)


2.3.2 THE SECOND STAGE TEST 


Those segments with shifts not meeting the required threshold value on 
the first stage test are automatically entered into a second stage
test. The second stage test examines the homogeneity of pixels within 
field boundaries. The program uses the gradient output array to obtain 
a measure of within field variability for each shift. Basically, the 
gradients represent how similar adjacent pixels are; so by taking the 
sum of the squared gradients within each field, a measure of 
homogeneity is obtained. 


Let g_{ijk} = gradient value for pixel i, j in field k.

    d_k = \frac{1}{n_k} \sum_{i,j} g_{ijk}^{2}                     (5)

the dispersion for field k, where n_k is the number of non-border
pixels in field k, and

    d_s = \sum_{k} d_k                                             (6)

the dispersion value for shift s.


The second stage test uses the following statistic to decide which
shift is best:

    v = \max_{L} \left( s / d_s \right)                            (7)

The ratio of the s value from the first stage to the d_s value is
maximized. This value gives the shift with the largest gradient along
the boundaries, relative to the dispersion within the boundaries. The
set L is the set of all segment shifts from the first stage which had s
values in the 2.0-3.4 range.

The shift value recorded from the second stage is that value of the
shift associated with v.
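
A minimal sketch of the second stage statistic follows. It assumes the grid coding of Section 2.1.3 (0 outside the segment, 1 on boundaries, field labels of 2 and above), so that only non-border field cells enter the dispersion. The function names and the candidate tuple layout are hypothetical.

```python
import numpy as np

def dispersion(gradient_window, fields):
    """Dispersion for one shift: sum over fields of the mean squared gradient.

    fields: int array with the same shape as gradient_window, coded
    0 (outside), 1 (boundary), >= 2 (field labels).
    """
    total = 0.0
    for k in np.unique(fields):
        if k < 2:
            continue  # skip cells outside the segment and boundary cells
        cells = gradient_window[fields == k]
        total += float((cells ** 2).sum()) / cells.size
    return total

def second_stage(candidates):
    """Pick the shift that maximizes s / d_s.

    candidates: list of (shift, s, d_s) tuples for the shifts whose s
    value fell in the 2.0-3.4 range; d_s is assumed to be non-zero.
    """
    return max(candidates, key=lambda c: c[1] / c[2])[0]
```

A shift with a slightly lower s can therefore win if the fields it encloses are markedly more homogeneous.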

2.4 FINAL ACCEPTANCE TEST


A criterion must be used to determine if the results of the shifting
process are acceptable. The criterion used in ASMA is based on a
technique outlined by Graham (1981).

The acceptance test is based on the assumption that the shifts accepted 
in the first stage test make up a sample of a population of 'reliable' 
shift values (based on s value). 

By constructing a 'confidence interval' around the mean of this
population, an acceptance region is established for shift values from
the second stage test. The current test is to accept the second stage
shift if the shift value for both the row and column is within:

    \bar{Y} \pm 1.7 \, \sigma_Y                                    (8)

where \bar{Y} is the mean shift from the first stage and \sigma_Y is
the standard deviation.

In summary, a shift from the first stage test is accepted if its s
value is greater than 3.4; otherwise, the shift from the second stage
test is accepted if it lies in the acceptance interval (8). A flow diagram
of ASMA is shown in Figure 6.
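
The final acceptance test can be sketched as below. The function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the report does not say which estimator was used. The multiplier 1.7 is the value given for the acceptance interval (8).

```python
import numpy as np

def acceptance_interval(first_stage_shifts, z=1.7):
    """Mean +/- z standard deviations of the first stage shifts."""
    a = np.asarray(first_stage_shifts, dtype=float)
    mean = a.mean()
    sd = a.std(ddof=1)  # assumed: sample standard deviation
    return mean - z * sd, mean + z * sd

def accept_second_stage(row_shift, col_shift, first_rows, first_cols, z=1.7):
    """Accept only if both the row and the column shift lie in the interval."""
    lo_r, hi_r = acceptance_interval(first_rows, z)
    lo_c, hi_c = acceptance_interval(first_cols, z)
    return bool(lo_r <= row_shift <= hi_r and lo_c <= col_shift <= hi_c)
```

Because the interval is built from the first stage shifts of the same scene, a scene with only a few first stage acceptances yields a poorly determined interval.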


3. TEST AND EVALUATION 


The algorithm developed by Graham and further refined here was based on 
a Landsat 2 Kansas data set (21980 - 16264) from 1981. The algorithm 
was tested and further refined on another set of 5 Landsat Scenes 
(Table 3.1). 

Figure 6. Flow Diagram of ASMA

Site        I.D.    Scene No.        No. Segments

Kansas      (1)     21980 - 16264         20
            (2)     22287 - 16313         39
Missouri    (1)     22370 - 15502          9
            (2)     22370 - 15504          8
            (3)     22371 - 15560         29
            (4)     22371 - 15563         16

                    TOTAL                121


Table 3.1 - Set of Six Scenes
Used for Testing


3.1 ERROR ANALYSIS 


Manual shifting results were obtained from USDA/SRS on all 6 scenes. 
There were at least 2 estimates (of the row and column shift numbers) 
obtained from 5 of the scenes in order to estimate the repeatability. 
The manual shifting results are listed in Appendix B. The mean shift 
value was used as an estimate of the true value in order to evaluate 
the algorithm results. Table 3.2 gives the results of the manual 
shifting error analysis. This is given by:

    \sigma = \sqrt{ \frac{ \sum_{i=1}^{n} (x_1 - x_2)_i^{2} }{ 2(n - 1) } }     (9)

where x_1 and x_2 correspond to the independent estimates by persons 1
and 2, and n is the number of segments.
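
The repeatability estimate (9) can be computed directly from two sets of manual shifts. In this sketch the function name is hypothetical and the square root is an assumption, made because Table 3.2 reports the result in pixels.

```python
import math

def manual_sigma(x1, x2):
    """Repeatability of manual shifting from paired estimates.

    x1, x2: shifts (in pixels) estimated independently by persons 1 and
    2 for the same n segments, in the same order.
    """
    n = len(x1)
    ss = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.sqrt(ss / (2 * (n - 1)))
```

For example, three segments whose paired estimates differ by 0, -1, and 1 pixel give a value of about 0.71 pixel.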

Scene         Row     Column

Kansas
  (1)         .41      .33
Missouri
  (1)         .23      .26
  (2)         .2°      .25
  (3)         .45      .30
  (4)         .13      .32


Table 3.2 - \sigma For Manual Shifting Results (In Pixels)



The results from the segment shifting algorithm are listed in Appendix
A. A summary of the error analysis is given (in meters) in Table 3.3.
The root mean square (rms) error is given by:

    \hat{\sigma}_X = \sqrt{ \hat{\sigma}_X^{2} }                   (10)

where:

    \hat{\sigma}_X^{2} = \frac{1}{n} \sum_{i=1}^{n} (X_{r_i} - \bar{X}_i)^{2}     (11)

and:

    X_{r_i}   = the ASMA shift for segment i, in the column direction
    \bar{X}_i = the mean manual shift for segment i
    n         = number of segments

Similarly, Y_{r_i} and \bar{Y}_i are computed for the row direction.
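
As an illustration, the rms error for one direction can be computed from the ASMA shifts and the mean manual shifts. The function name and the `pixel_m` parameter are hypothetical; the 57 m pixel size is taken from the footnote in Section 2.2, and the conversion to meters assumes the shifts are expressed in pixels.

```python
import math

def rms_error(asma_shifts, manual_means, pixel_m=57.0):
    """RMS difference between ASMA and mean manual shifts, in meters.

    asma_shifts, manual_means: shifts in pixels for the same n segments.
    """
    n = len(asma_shifts)
    mean_sq = sum((a - m) ** 2 for a, m in zip(asma_shifts, manual_means)) / n
    return math.sqrt(mean_sq) * pixel_m
```

The same function serves for the row direction by passing the row shifts instead of the column shifts.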



3.2 DISCUSSION OF RESULTS 


The initial aim to get results with half-pixel (28.5m) accuracy was met
in most cases. Referring to Table 3.3, the 28.5 meter requirement was
met in all cases for the row RMS and in 4 out of 6 cases for the column
RMS. The overall RMS errors, 18.89m and 25.23m, did meet the
requirement. The total RMS error, calculated as:

    S_{TOT} = ( S_X^{2} + S_Y^{2} )^{1/2}                          (12)

was also within the 40.30 meters required.
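
The arithmetic behind the total RMS error can be checked in a few lines. This sketch assumes the total is the root sum of squares of the two overall RMS errors quoted above, and that the 40.30 m requirement is the same combination applied to the 28.5 m half-pixel limit in each direction.

```python
import math

overall_rms_a = 18.89  # meters, first overall RMS error quoted in the text
overall_rms_b = 25.23  # meters, second overall RMS error quoted in the text

# Root sum of squares of the two directions: about 31.5 m.
total_rms = math.hypot(overall_rms_a, overall_rms_b)

# Half-pixel limit (28.5 m) in both directions combined: about 40.3 m.
total_limit = math.hypot(28.5, 28.5)

print(f"total RMS = {total_rms:.2f} m, limit = {total_limit:.2f} m")
```

On these numbers the total falls roughly 9 m inside the combined limit.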

Of the total 121 segments, 90 were accepted. This gives an acceptance
rate of about 74.4%.

The RMS error increases with the number of segments accepted, so
the more segments accepted, the larger the RMS errors. Some work was
done in trying to optimize the acceptance region, and the 1.7 used was a
rough approximation to a Z value for a 90% confidence region. There is
no assumption made about the distribution of the shift numbers. The
value used may also be made optional.

Several iterations of the algorithm showed that when Z was less than
1.7 too many segments were not accepted and at larger values, too many
were accepted, inflating the RMS errors.

The results indicated also that ASMA had a slightly greater shift than 
the average manual shift, Table 3.3. This result had no immediate 
significance and may disappear with further testing of the algorithm. 
It was noticed in analyzing the results that the shifts were almost 
always negative, that is, the shift was usually up and to the left. 
This fact may just be an anomaly of the USDA/SRS registration procedure 
or peculiar to the areas of study, i.e., Kansas and Missouri. 

Table 3.3 - RESULTS OF ASMA ERROR ANALYSIS BASED ON USDA/SRS MANUAL SHIFTING ESTIMATES




The correlation between the ASMA results and manual results was also
examined. The correlations (r) are shown in Table 3.4. 


Site            Row Shift    Column Shift    Sample Size
                    r             r               n

Kansas   (1)      .897          .929             18
         (2)      .793          .932             30
Missouri (1)      .858          .962              6
         (2)      .224          .506              6
         (3)      .946          .924             16
         (4)      .977          .947             12


Table 3.4 - Correlations between manual and ASMA results


In all cases except Missouri (2), the correlations were high. The
Missouri (2) scene was also the one with the largest RMS errors, the
column shifts not being within the accepted half pixel accuracy. This
scene gave some problems earlier in the development of the first
acceptance test, in that only 1 of the 8 segments was accepted. The
test was made less stringent and thus 6 of the 8 segments were
accepted. The results from this scene may be due to one segment in
particular, or due to the small sample size (8). When segment 6344 is
discarded, the RMS errors become 23.357, 24.682, and 35.386 meters for
row, column, and total, respectively. These errors are then
acceptable.

This particular scene, Missouri (2), had only 4 segments making up the
statistics for the final acceptance test. This is perhaps a limitation
of the program and something which could merit further study. It is
also conceivable that the segments from this scene did not lend themselves
to the characteristics making the other scenes successful, i.e., good
boundary delineation. Segment 6344 is shown in Figure 7.




Figure 7. Segment 6344 (panels 7a, 7b, 7c)




All in all, the algorithm gave relatively good results. Some of the
segments and their shifted boundaries are shown in Figures 4 to 11.
Segment 6450, Figures 1 and 2, gave good results in the area of fields
A1, A4 and A5. The narrow area to the right of these fields was found
to fit quite well after shifting by ASMA. Figure 8 shows a segment
which not only matched up well but also gave exactly the same shift as
the USDA/SRS manual shift.

Figure 9 is a good example of a confusion segment. This segment,
incidentally, had the largest difference in shift from the USDA/SRS
estimate [See Appendix A - segment 7150, Kansas (2)]. The ASMA shift
is shown in Figure 9b and the USDA/SRS shift is shown in Figure 9c.
The confusion lies in the fact that the USDA/SRS fit looks better but
at the same time, Fields C2 and C3 are both winter wheat. The ASMA
shift shows C2 and C3 as being alike. The USDA/SRS shift shows them as
different. D1 and D2 are also winter wheat (on the crop code list).
These show up as being two dark fields in Figure 9 as opposed to the
two light fields, C2 and C3, also representing winter wheat. This
segment could probably have been eliminated from the error analysis
because of the confusion but was kept instead in order not to bias the
results (since not all segments were visually checked this closely).

Figures 10 a,b show another segment which was matched quite well and
which also resulted in the same shift as the USDA/SRS. Figures 11
a,b,c show another good segment shift in which ASMA was a quarter-pixel
off from the manual estimate. (This difference is really too small to
see in the figures attached.) In this case, ASMA happened to agree
exactly with one USDA/SRS estimate, but was different from the average
of the two USDA/SRS estimates.


3.3 RECOMMENDATION 


More data sets need to be tested in order to further evaluate the 
algorithm. Also, some further research should be done on the final
acceptance test. It appears that for scenes with large samples of
segments, the acceptance test is well established. When the number of
segments is small the acceptance test is weak since there are not
enough segments to construct a good confidence interval. In this case,
the user may want to shift those few segments manually. The overall
results are within the half pixel accuracy requirement and the
algorithm is thus recommended for use with scenes with large samples of
segments.

Figure 8. Matching Segment Which Gives Same Shift As Manual Shift

Figure 9. Example of a Confusion Segment (panels 9a, 9b, 9c)

Figure 11c (figure panel titles: ORIGINAL REGISTRATION; AFTER SHIFTING
(USDA); AFTER SHIFTING (ASMA))


4. CONCLUDING REMARKS 

The segment shifting algorithm was developed on a Perkin-Elmer 3242 
32-bit minicomputer. The CPU time is listed in Table 4.1 for the six 
data sets tested. 


Table 4.1 - COMPUTER RUN TIMES

The ASMA program has been put in the USDA/SRS EDITOR System (Ozga et al.,
1977). An initial USDA/SRS evaluation of the algorithm was performed
on the Kansas (2) scene with 39 segments. As far as actual results,
there were some minor differences in three or four of the 36 segments
actually shifted. Segment 7150 resulted in a shift of -1.5 rows and
-1.0 columns after ASMA was run through EDITOR. This result is for the
confusion segment discussed in Section 3.2, Figures 9 a, b, c. It is
not certain why the result is different, particularly for this segment,
but differences in machine roundoff may account for part of the
problem. The rest of the segments matched exactly except for 2 which
were half a pixel off.